{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 正则表达式(Regular Expression常简写为regex 或者 RE)并不是Python所特有的而是属于整个计算机科学的一个概念，其主要是用来识别、分割、检索、替换某种具有特定格式的文本而产生的。目前正则表达式在众多计算机语言(Python,Java,C++等)中有着广泛的应用,另外写爬虫程序而言正则表达式也是必不可少的。\n",
    "\n",
    "## 正则表达式的使用方法：\n",
    "* 使用match方法进行匹配：\n",
    "match()方法用于从字符串的开始出进行匹配，如果在其实位置匹配成功，则返回一个match对象，否则返回None。语法格式如下\n",
    "> re.match(pattern，string，[flags])\n",
    "> 参数说明如下：\n",
    "> pattern: 表示模版，即进行字符串匹配的格式，由正则表达式转化而来。\n",
    "> string: 表示要匹配的字符串。\n",
    "> flags: 可选参数，用于限定匹配方式，如是否要区分大小写\n",
    "* 使用search方法进行匹配：\n",
    "search方法用于在整个字符串中搜索第一个匹配值，如果在其实位置匹配成功则返回Match()对象，否则None。语法格式如下\n",
    "> re.search(pattern，string，[flags])\n",
    "> 参数作用同上\n",
    "* 使用findall()方法进行匹配：\n",
    "findall()方法用于在整个字符串中搜索所有符合正则表达式的字符串，并且以列表的形式返回。如果匹配成功则返回包含匹配结构的列表，否则返回空列表，其语法格式如下：\n",
    "> re.findall(pattern，string，[flags])\n",
    "* 使用sub()方法替换字符串：\n",
    "> re.sub(pattern，repl，string，count，[flags])\n",
    "> 其参数说明如下：\n",
    "> pattern：替换模板，由模式字符串转换而来\n",
    "> repl:   表示将原始字符串替换成的字符串\n",
    "> string: 表示原始字符串\n",
    "> count: 表示匹配后替换的最大次数，默认值为零\n",
    "* 使用split()方法分割字符串\n",
    "> re.split(pattern，string，[maxsplit]，[flags])\n",
    "> maxsplit即最大拆分次数\n",
    "\n",
    "以上方法中flags参数的使用：\n",
    "\n",
    "|   标志   |                 说明                  |\n",
    "| :------: | :-----------------------------------: |\n",
    "| A或ASCII |           只进行ASCCII匹配            |\n",
    "|    I     |           不区分大小写匹配            |\n",
    "|    M     |   将行定位符用于整个字符串的每一行    |\n",
    "|    X     | 忽略掉模式字符串中未转义的空格和注释  |\n",
    "|    S     | 使用‘’.‘’字符匹配所有字符，包括换行符 |\n",
    "\n",
    "## 正则表达式的语法规则：\n",
    "\n",
    "各种元字符的作用如下：\n",
    "\n",
    "| 字符 |           说明           |\n",
    "| :--: | :----------------------: |\n",
    "|  .   | 匹配换行符以外的任意字符 |\n",
    "|  \\w  | 匹配字母数字下划线或汉字 |\n",
    "|  \\s  |      匹配任意空白符      |\n",
    "|  \\d  |         匹配数字         |\n",
    "|  \\b  |   匹配单词的开始或结束   |\n",
    "|  ^   |  行定位符匹配字符串开始  |\n",
    "|  $   |  行定位符匹配字符串结尾  |\n",
    "\n",
    "各种匹配限定符如下：\n",
    "\n",
    "| 限定符 |             说明             |            例子             |\n",
    "| :----: | :--------------------------: | :-------------------------: |\n",
    "|   ？   |   匹配前面的字符一次或零次   | 如lov?e可以匹配love或者loe  |\n",
    "|   +    |   匹配前面的字符一次或多次   |                             |\n",
    "|   *    |   匹配前面的字符零次或多次   |                             |\n",
    "|  {n}   |      匹配前面的字符n次       | 如teac{1}her只能匹配teacher |\n",
    "|  {n,}  |    匹配前面的字符至少n次     |                             |\n",
    "| {n,m}  | 匹配前面的字符至少n次至多m次 |                             |\n",
    "\n",
    "使用元字符和限定符进行匹配是很简单的事情但是如果要像不使用元字符和限定符进行匹配也是可以的只需要在[]里面加上要匹配的字符就可以了，但是还是有语法规则如下：\n",
    "\n",
    "> 匹配任意一位数字：[0-9]\n",
    ">\n",
    "> 匹配任意一个字母：[a-zA-Z]\n",
    ">\n",
    "> 匹配任意的字母数字下划线：[a-z0-9A-Z_]等同于\\w\n",
    ">\n",
    "> 排除字符: [\\^a-zA-Z] 加上括号之后就是排除字母\n",
    ">\n",
    "> 选择字符：all the counts 身份证号码的长度为15位或者是十八位，如果要匹配其可用如下表达式\n",
    ">\n",
    "> ​                   (\\^\\d{15}\\$)|(\\^\\d{18}\\$)|(\\^\\d{17}\\$)|(\\^\\d|x|X$)\n",
    ">\n",
    "> 转义字符：正则表达式中的转义字符\\的用法在各种语言中都大同小异都是把特殊字符变成普通字符，\n",
    ">\n",
    "> ​                  比如匹配一个IP地址 127.0.0.1\n",
    ">\n",
    "> ​                  如果用表达式[1-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1.3}着显然不对因为.是转义字符有特殊含义  \n",
    ">\n",
    "> ​                  必须在其前面加上转义字符才可以[1-9]{1,3}\\\\.[0-9]{1,3}\\\\.[0-9]{1,3} \\\\.[0-9]{1.3}，\n",
    ">\n",
    "> 分组：(four|thir)th这个就不仅仅是匹配括号中的four和thir这么简单，这个表达式的意思是匹配fourth，\n",
    ">\n",
    "> ​          或者thirth\n",
    ">\n",
    "> 重复操作：([0-9]){3}，这是对小括号内的表达式进行重复操作3次             \n",
    "\n",
    "Python中正则表达式的语法：\n",
    "\n",
    "在python中是将正则表达式当作模式字符串来使用的如下：\n",
    "\n",
    "> ‘[a-zA-Z]’\n",
    "\n",
    "而且字符串必须是写成原生字符串的形式\n",
    "\n",
    "> ‘\\b\\w\\b’ \n",
    "\n",
    "上面的写法是错误的正确的如下\n",
    "\n",
    "> ‘\\\\\\\\b\\\\\\\\w\\\\\\\\b’\n",
    "\n",
    "如果在表达式中还有很多的反斜杠和特殊字符则建议采用如下：\n",
    "\n",
    "> r’\\bm\\w*\\b‘\n",
    ">\n",
    "> 意思是把模式字符串变成原生字符串，建议都加r处理\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "luzekun1609301\n"
     ]
    }
   ],
   "source": [
    "# 拼接字符串\n",
    "string1 = 'luzekun'\n",
    "string2 = '1609301'\n",
    "print(string1+string2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "7\n"
     ]
    }
   ],
   "source": [
    "#计算字符串长度\n",
    "print(len(string1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "lzk\n"
     ]
    }
   ],
   "source": [
    "# 截取字符串\n",
    "print(string1[:5:2]) #截取从第一个字符开始到第五个字符为止以步长为二"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 分割字符串：\n",
    "`str.split(sep,maxsplit)`\n",
    "参数解释：\n",
    "1.sep：用于指定分隔符,若不指定则默认空格分隔符\n",
    "\n",
    "2.maxsplit:用于指定最大分割次数,默认无限\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['luozekun 517926120', 'qq.com 1609301']"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string = 'luozekun 517926120@qq.com 1609301'\n",
    "string.split('@')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ">字符串分割之后做成了一个列表"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 检索字符串:\n",
    "count方法用于检索字符串中是否包含某个子字符串且返回子符串出现的次数\n",
    "`str.count(sub,start,end)`\n",
    "find方法用于检索字符串中是否包含某个子字符串且返回子字符串首字符首次出现的索引\n",
    "`str.find(sub,start,end)`\n",
    "index方法与find方法类似，不过find方法在没有发现检索字符串时返回-1,index则抛出异常\n",
    "参数说明:\n",
    "1.sub:表示检索对象\n",
    "\n",
    "2.start:表示检索其实位置\n",
    "\n",
    "3.end:表示检索结尾位置"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "luozekun 517926120@qq.com 1609301\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "4"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(string)\n",
    "string.count('1')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "15"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string.find('120')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 判断字符串的开始和结尾：\n",
    "1.`str.startwith(prefix,start,end)`\n",
    "\n",
    "2.`str.endwith(prefix,start,end)`\n",
    "\n",
    "参数说明:\n",
    "\n",
    "prefix:子字符串\n",
    "\n",
    "start:起始位置\n",
    "\n",
    "end:结尾位置"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 字母大小写转换:\n",
    "\n",
    "`string.lower()`:大写转小写\n",
    "\n",
    "`string.upper()`:小写转大写"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 除去字符串中空格字符和特殊字符\n",
    "\n",
    "`str.strip([chars])`:去除左右两端空格和特殊字符(\\t,\\r,\\n)\n",
    "\n",
    "`str.lstrip([chars])`:去除左边空格和特殊字符\n",
    "\n",
    "`str.rstrip([chars])`:去除右端空格和特殊字符\n",
    "\n",
    "参数说明:chars为指定去除的字符"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "原始字符串:  321898190 @\t\n",
      "21\n",
      "变化字符串:321898190 @21\n"
     ]
    }
   ],
   "source": [
    "string = '  321898190 @\\t\\r\\n'\n",
    "print('原始字符串:'+ string + '21')\n",
    "print('变化字符串:'+ string.strip()+'21')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 格式化字符串\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [Anacond1]",
   "language": "python",
   "name": "Python [Anacond1]"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
